Results 1 - 18 of 18
1.
Molecules ; 29(7)2024 Apr 01.
Article in English | MEDLINE | ID: mdl-38611856

ABSTRACT

SARS-CoV-2 is the virus responsible for a respiratory disease called COVID-19 that devastated global public health. Since 2020, there has been an intense effort by the scientific community to develop safe and effective prophylactic and therapeutic agents against this disease. In this context, peptides have emerged as an alternative for inhibiting the causative agent. However, designing peptides that bind efficiently is still an open challenge. Here, we show an algorithm for peptide engineering. Our strategy consists of starting with a peptide whose structure is similar to the interaction region of the human ACE2 protein with the SPIKE protein, which is important for SARS-CoV-2 infection. Our methodology is based on a genetic algorithm performing systematic steps of random mutation, protein-peptide docking (using the PyRosetta library) and selecting the best-optimized peptides based on the contacts made at the peptide-protein interface. We performed three case studies to evaluate the tool parameters and compared our results with proposals presented in the literature. Additionally, we performed molecular dynamics (MD) simulations (three systems, 200 ns each) to probe whether our suggested peptides could interact with the spike protein. Our results suggest that our methodology could be a good strategy for designing peptides.
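The mutate, score and select cycle described in this abstract can be sketched as a generic genetic algorithm loop. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the population sizes and the scoring function `toy_contact_score` are hypothetical, and the score is a placeholder standing in for the PyRosetta docking and interface-contact counting that the abstract mentions.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mutate(peptide, rng):
    """Return a copy of the peptide with one randomly chosen position substituted."""
    pos = rng.randrange(len(peptide))
    choices = [aa for aa in AMINO_ACIDS if aa != peptide[pos]]
    return peptide[:pos] + rng.choice(choices) + peptide[pos + 1:]


def toy_contact_score(peptide):
    """Hypothetical placeholder for docking plus interface-contact counting.

    It only counts charged and aromatic residues and has no physical meaning.
    """
    return sum(peptide.count(aa) for aa in "DEKRFWY")


def evolve(seed, score, generations=20, population_size=30, keep=5, rng=None):
    """Run a simple elitist genetic algorithm starting from one seed peptide."""
    rng = rng or random.Random(0)
    population = [seed]
    for _ in range(generations):
        # Mutation step: each child is a point mutant of a random survivor.
        children = [mutate(rng.choice(population), rng) for _ in range(population_size)]
        # Selection step: rank parents and children together, best score first
        # (ties broken alphabetically so the result is reproducible).
        candidates = set(population + children)
        ranked = sorted(candidates, key=lambda p: (-score(p), p))
        population = ranked[:keep]
    return population
```

Because parents compete with their children at every generation, the best score in the population never decreases. In a real pipeline the scoring call would dominate the running time, since each evaluation involves a docking run.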


Subjects
COVID-19, Coronavirus Spike Glycoprotein, Humans, SARS-CoV-2, Peptides/pharmacology
2.
PLoS Comput Biol ; 19(12): e1011679, 2023 Dec.
Article in English | MEDLINE | ID: mdl-38127831

ABSTRACT

The article presents a framework for a Bioinformatics competition that focuses on 4 key aspects: structure, model, overview, and perspectives. Structure represents the organizational framework employed to coordinate the main tasks involved in the competition. Model showcases the competition design, which encompasses 3 phases. Overview presents our case study, the League of Brazilian Bioinformatics (LBB) 2nd Edition. Finally, the section on perspectives provides a brief discussion of the LBB 2nd Edition, along with insights and feedback from participants. LBB is a biannual team competition launched in 2019 to promote the ongoing training of human resources in Bioinformatics and Computational Biology in Brazil. LBB aims to stimulate ongoing training in Bioinformatics by encouraging participation in competitions, promoting the organization of future Bioinformatics competitions, and fostering the integration of the Bioinformatics and Computational Biology community in the country, as well as collaboration among participants. The LBB 2nd Edition was launched in 2021 and featured 251 competitors forming 91 teams. Knowledge competitions promote learning, collaboration, and innovation, which are crucial for advancing scientific knowledge and solving real-world problems. In summary, this article serves as a valuable resource for individuals and organizations interested in developing knowledge competitions, offering a model based on our experience with LBB to benefit all levels of Bioinformatics trainees.


Subjects
Computational Biology, Humans, Brazil, Computational Biology/education
3.
J Chem Inf Model ; 62(18): 4300-4318, 2022 09 26.
Article in English | MEDLINE | ID: mdl-36102784

ABSTRACT

Machine learning-based drug discovery success depends on molecular representation. Yet traditional molecular fingerprints omit both the protein and pointers back to structural information that would enable better model interpretability. Therefore, we propose LUNA, a Python 3 toolkit that calculates and encodes protein-ligand interactions into new hashed fingerprints inspired by Extended Connectivity FingerPrint (ECFP): EIFP (Extended Interaction FingerPrint), FIFP (Functional Interaction FingerPrint), and Hybrid Interaction FingerPrint (HIFP). LUNA also provides visual strategies to make the fingerprints interpretable. We performed three major experiments exploring the fingerprints' use. First, we trained machine learning models to reproduce DOCK3.7 scores using 1 million docked Dopamine D4 complexes. We found that EIFP-4,096 performed (R2 = 0.61) superior to related molecular and interaction fingerprints. Second, we used LUNA to support interpretable machine learning models. Finally, we demonstrate that interaction fingerprints can accurately identify similarities across molecular complexes that other fingerprints overlook. Hence, we envision LUNA and its interface fingerprints as promising methods for machine learning-based virtual screening campaigns. LUNA is freely available at https://github.com/keiserlab/LUNA.
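The idea of a hashed interaction fingerprint that keeps pointers back to structural information can be illustrated with a short sketch. It does not use or reproduce the LUNA API: the function names and the tuple layout of an interaction are assumptions made for illustration, and real ECFP-style fingerprints also hash growing atom neighborhoods iteratively, which is omitted here. Only the folding of features into a fixed-length bit vector (4,096 bits, as in EIFP-4,096) is shown.

```python
import hashlib
from collections import defaultdict


def hash_feature(feature, n_bits=4096):
    """Map an interaction description (a tuple of strings) to a stable bit index."""
    digest = hashlib.sha1("|".join(feature).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_bits


def fold_interactions(interactions, n_bits=4096):
    """Fold interactions into a bit vector and remember which feature set each bit."""
    bits = [0] * n_bits
    provenance = defaultdict(list)  # bit index -> features that produced it
    for feature in interactions:
        idx = hash_feature(feature, n_bits)
        bits[idx] = 1
        provenance[idx].append(feature)
    return bits, dict(provenance)


def tanimoto(a, b):
    """Tanimoto similarity between two equal-length bit vectors."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 1.0
```

The `provenance` mapping is what makes such a fingerprint interpretable: a bit that a model finds important can be traced back to the concrete interactions that set it. Hash collisions are possible, so a single bit may point to more than one feature.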


Subjects
Dopamine, Proteins, Drug Discovery/methods, Ligands, Machine Learning, Proteins/chemistry
4.
Nucleic Acids Res ; 50(W1): W392-W397, 2022 07 05.
Article in English | MEDLINE | ID: mdl-35524575

ABSTRACT

Proteins are essential macromolecules for the maintenance of living systems. Many of them perform their function by interacting with other molecules in regions called binding sites. The identification and characterization of these regions are of fundamental importance to determine protein function, being a fundamental step in processes such as drug design and discovery. However, identifying such binding regions is not trivial due to the drawbacks of experimental methods, which are costly and time-consuming. Here we propose GRaSP-web, a web server that uses GRaSP (Graph-based Residue neighborhood Strategy to Predict binding sites), a residue-centric method based on graphs that uses machine learning to predict putative ligand binding site residues. The method outperformed 6 state-of-the-art residue-centric methods (MCC of 0.61). Also, GRaSP-web is scalable as it takes 10-20 seconds to predict binding sites for a protein complex (the state-of-the-art residue-centric method takes 2-5 h on average). It proved to be consistent in predicting binding sites for bound/unbound structures (MCC 0.61 for both) and for a large dataset of multi-chain proteins (4500 entries, MCC 0.61). GRaSP-web is freely available at https://grasp.ufv.br.
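The MCC values quoted in this abstract refer to the Matthews correlation coefficient, a standard metric for binary classifiers that stays informative when one class is rare, as binding-site residues usually are. The sketch below shows how it is computed from per-residue labels; the function names are illustrative and are not part of GRaSP.

```python
import math


def confusion_counts(actual, predicted):
    """Count true/false positives and negatives from per-residue labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)
    tn = sum(1 for a, p in pairs if not a and not p)
    fp = sum(1 for a, p in pairs if not a and p)
    fn = sum(1 for a, p in pairs if a and not p)
    return tp, tn, fp, fn


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column of the table is empty
    return (tp * tn - fp * fn) / denom
```

For example, with true labels `[1, 1, 0, 0, 0, 0]` and predictions `[1, 0, 0, 0, 0, 1]`, the counts are TP = 1, TN = 3, FP = 1, FN = 1, giving an MCC of 2 / 8 = 0.25. A value of 1 means perfect prediction, 0 is no better than chance, and -1 is total disagreement.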


Subjects
Machine Learning, Proteins, Proteins/chemistry, Binding Sites, Ligands, Protein Domains, Protein Binding
5.
BMC Bioinformatics ; 22(1): 1, 2021 Jan 02.
Article in English | MEDLINE | ID: mdl-33388027

ABSTRACT

BACKGROUND: Protein-peptide interactions play a fundamental role in a wide variety of biological processes, such as cell signaling, regulatory networks, immune responses, and enzyme inhibition. Peptides are characterized by low toxicity and small interface areas; therefore, they are good targets for therapeutic strategies, rational drug planning and protein inhibition. Approximately 10% of the ethical pharmaceutical market is protein/peptide-based. Furthermore, it is estimated that 40% of protein interactions are mediated by peptides. Despite the fast increase in the volume of biological data, particularly on sequences and structures, there remains a lack of broad and comprehensive protein-peptide databases and tools that allow the retrieval, characterization and understanding of protein-peptide recognition and consequently support peptide design. RESULTS: We introduce Propedia, a comprehensive and up-to-date database with a web interface that permits clustering, searching and visualizing of protein-peptide complexes according to varied criteria. Propedia comprises over 19,000 high-resolution structures from the Protein Data Bank including structural and sequence information from protein-peptide complexes. The main advantage of Propedia over other peptide databases is that it allows a more comprehensive analysis of similarity and redundancy. It was constructed based on a hybrid clustering algorithm that compares and groups peptides by sequences, interface structures and binding sites. Propedia is available through a graphical, user-friendly and functional interface where users can retrieve and analyze complexes and download each search data set. We performed case studies and verified the utility of Propedia scores for ranking promising interacting peptides.
In a study involving predicting peptides to inhibit SARS-CoV-2 main protease, we showed that Propedia scores related to similarity between different peptide complexes with SARS-CoV-2 main protease are in agreement with molecular dynamics free energy calculation. CONCLUSIONS: Propedia is a database and tool to support structure-based rational design of peptides for special purposes. Protein-peptide interactions can be useful for predicting, classifying and scoring complexes, as well as for designing new molecules. Propedia is up-to-date as a ready-to-use webserver with a friendly and resourceful interface and is available at: https://bioinfo.dcc.ufmg.br/propedia.


Subjects
Database Management Systems, Protein Databases, Peptides/chemistry, Proteins/chemistry, Algorithms, Humans
6.
Front Bioinform ; 1: 711463, 2021.
Article in English | MEDLINE | ID: mdl-36303729

ABSTRACT

Bioinformatics is a fast-evolving research field, requiring effective educational initiatives to bring computational knowledge to Life Sciences. Since 2017, an organizing committee composed of graduate students and postdoctoral researchers from the Universidade Federal de Minas Gerais (Brazil) promotes a week-long event named Summer Course in Bioinformatics (CVBioinfo). This event aims to diffuse bioinformatic principles, news, and methods mainly focused on audiences of undergraduate students. Furthermore, as the advent of the COVID-19 global pandemic has precluded in-person events, we offered the event in online mode, using free video transmission platforms. Herein, we present and discuss the insights obtained from promoting the Online Workshop in Bioinformatics (WOB) organized in November 2020, comparing it to our experience in previous in-person editions of the same event.

7.
Bioinformatics ; 36(Suppl_2): i726-i734, 2020 12 30.
Article in English | MEDLINE | ID: mdl-33381849

ABSTRACT

MOTIVATION: The discovery of protein-ligand-binding sites is a major step for elucidating protein function and for investigating new functional roles. Detecting protein-ligand-binding sites experimentally is time-consuming and expensive. Thus, a variety of in silico methods to detect and predict binding sites was proposed as they can be scalable, fast and present low cost. RESULTS: We proposed Graph-based Residue neighborhood Strategy to Predict binding sites (GRaSP), a novel residue centric and scalable method to predict ligand-binding site residues. It is based on a supervised learning strategy that models the residue environment as a graph at the atomic level. Results show that GRaSP made compatible or superior predictions when compared with methods described in the literature. GRaSP outperformed six other residue-centric methods, including the one considered as state-of-the-art. Also, our method achieved better results than the method from CAMEO independent assessment. GRaSP ranked second when compared with five state-of-the-art pocket-centric methods, which we consider a significant result, as it was not devised to predict pockets. Finally, our method proved scalable as it took 10-20 s on average to predict the binding site for a protein complex whereas the state-of-the-art residue-centric method takes 2-5 h on average. AVAILABILITY AND IMPLEMENTATION: The source code and datasets are available at https://github.com/charles-abreu/GRaSP. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Proteins, Software, Binding Sites, Hand Strength, Ligands
8.
BMC Bioinformatics ; 21(1): 275, 2020 Jul 01.
Article in English | MEDLINE | ID: mdl-32611389

ABSTRACT

BACKGROUND: Protein engineering has many applications for industry, such as the development of new drugs, vaccines, treatment therapies, food, and biofuel production. A common way to engineer a protein is to perform mutations in functionally essential residues to optimize their function. However, the discovery of beneficial mutations for proteins is a complex task, with a time-consuming and high cost for experimental validation. Hence, computational approaches have been used to propose new insights for experiments narrowing the search space and reducing the costs. RESULTS: In this study, we developed Proteus (an acronym for Protein Engineering Supporter), a new algorithm for proposing mutation pairs in a target 3D structure. These suggestions are based on contacts observed in other known structures from Protein Data Bank (PDB). Proteus' basic assumption is that if a non-interacting pair of amino acid residues in the target structure is exchanged to an interacting pair, this could enhance protein stability. This trade is only allowed if the main-chain conformation of the residues involved in the contact is conserved. Furthermore, no steric impediment is expected between the proposed mutations and the surrounding protein atoms. To evaluate Proteus, we performed two case studies with proteins of industrial interests. In the first case study, we evaluated if the mutations suggested by Proteus for four protein structures enhance the number of inter-residue contacts. Our results suggest that most mutations proposed by Proteus increase the number of interactions into the protein. In the second case study, we used Proteus to suggest mutations for a lysozyme protein. Then, we compared Proteus' outcomes to mutations with available experimental evidence reported in the ProTherm database. Four mutations, in which our results agree with the experimental data, were found. 
This could be initial evidence that changes in the side-chain of some residues do not cause disturbances that harm protein structure stability. CONCLUSION: We believe that Proteus could be used combined with other methods to give new insights into the rational development of engineered proteins. Proteus user-friendly web-based tool is available at < http://proteus.dcc.ufmg.br >.


Subjects
Proteins/chemistry, User-Computer Interface, Algorithms, Protein Databases, Muramidase/chemistry, Muramidase/genetics, Muramidase/metabolism, Mutagenesis, Protein Engineering/methods, Tertiary Protein Structure, Proteins/genetics, Proteins/metabolism
9.
BMC Bioinformatics ; 21(Suppl 2): 80, 2020 Mar 11.
Article in English | MEDLINE | ID: mdl-32164574

ABSTRACT

BACKGROUND: Interactions between proteins and non-proteic small molecule ligands play important roles in the biological processes of living systems. Thus, the development of computational methods to support our understanding of the ligand-receptor recognition process is of fundamental importance since these methods are a major step towards ligand prediction, target identification, lead discovery, and more. This article presents visGReMLIN, a web server that couples a graph mining-based strategy to detect motifs at the protein-ligand interface with an interactive platform to visually explore and interpret these motifs in the context of protein-ligand interfaces. RESULTS: To illustrate the potential of visGReMLIN, we conducted two cases in which our strategy was compared with previous experimentally and computationally determined results. visGReMLIN allowed us to detect patterns previously documented in the literature in a totally visual manner. In addition, we found some motifs that we believe are relevant to protein-ligand interactions in the analyzed datasets. CONCLUSIONS: We aimed to build a visual analytics-oriented web server to detect and visualize common motifs at the protein-ligand interface. visGReMLIN motifs can support users in gaining insights on the key atoms/residues responsible for protein-ligand interactions in a dataset of complexes.


Subjects
Ligands, Proteins/metabolism, User-Computer Interface, Humans, Hydrogen Bonding, Hydrophobic and Hydrophilic Interactions, Protein Binding, Proteins/chemistry
10.
IEEE/ACM Trans Comput Biol Bioinform ; 17(4): 1317-1328, 2020.
Article in English | MEDLINE | ID: mdl-30629512

ABSTRACT

Essential roles in biological systems depend on protein-ligand recognition, which is mostly driven by specific non-covalent interactions. Consequently, investigating these interactions contributes to understanding how molecular recognition occurs. Nowadays, a large-scale data set of protein-ligand complexes is available in the Protein Data Bank, which has led to several tools being proposed in an effort to elucidate protein-ligand interactions. Nonetheless, there is not an all-in-one tool that couples large-scale statistical, visual, and interactive analysis of conserved protein-ligand interactions. Therefore, we propose nAPOLI (Analysis of PrOtein-Ligand Interactions), a web server that combines large-scale analysis of conserved interactions in protein-ligand complexes at the atomic-level, interactive visual representations, and comprehensive reports of the interacting residues/atoms to detect and explore conserved non-covalent interactions. We demonstrate the potential of nAPOLI in detecting important conserved interacting residues through four case studies: two involving a human cyclin-dependent kinase 2 (CDK2), one related to ricin, and another to the human nuclear receptor subfamily 3 (hNR3). nAPOLI proved to be suitable to identify conserved interactions according to literature, as well as highlight additional interactions. Finally, we illustrate, with a virtual screening ligand selection, how nAPOLI can be widely applied in structural biology and drug design. nAPOLI is freely available at bioinfo.dcc.ufmg.br/napoli/.


Subjects
Computational Biology/methods, Data Visualization, Proteins, Algorithms, Cluster Analysis, Protein Databases, Humans, Ligands, Molecular Models, Protein Binding, Proteins/chemistry, Proteins/metabolism
11.
BMC Bioinformatics ; 18(Suppl 10): 403, 2017 Sep 13.
Article in English | MEDLINE | ID: mdl-28929973

ABSTRACT

BACKGROUND: A huge amount of data about genomes and sequence variation is available and continues to grow on a large scale, which makes experimentally characterizing these mutations infeasible regarding disease association and effects on protein structure and function. Therefore, reliable computational approaches are needed to support the understanding of mutations and their impacts. Here, we present VERMONT 2.0, a visual interactive platform that combines sequence and structural parameters with interactive visualizations to make the impact of protein point mutations more understandable. RESULTS: We aimed to contribute a novel visual analytics oriented method to analyze and gain insight on the impact of protein point mutations. To assess the ability of VERMONT to do this, we visually examined a set of mutations that were experimentally characterized to determine if VERMONT could identify damaging mutations and why they can be considered so. CONCLUSIONS: VERMONT allowed us to understand mutations by interpreting position-specific structural and physicochemical properties. Additionally, we note some specific positions we believe have an impact on protein function/structure in the case of mutation.


Subjects
DNA Mutational Analysis/methods, Software, Amino Acid Sequence, Conserved Sequence/genetics, Humans, Hydrophobic and Hydrophilic Interactions, Mutation/genetics, Single Nucleotide Polymorphism/genetics, Sequence Alignment, Tumor Suppressor Protein p53/chemistry, Tumor Suppressor Protein p53/genetics
12.
Bioinformatics ; 31(17): 2894-6, 2015 Sep 01.
Article in English | MEDLINE | ID: mdl-25910698

ABSTRACT

PDBest (PDB Enhanced Structures Toolkit) is a user-friendly, freely available platform for acquiring, manipulating and normalizing protein structures in a high-throughput and seamless fashion. With an intuitive graphical interface it allows users with no programming background to download and manipulate their files. The platform also exports protocols, enabling users to easily share PDB searching and filtering criteria, enhancing analysis reproducibility. AVAILABILITY AND IMPLEMENTATION: PDBest installation packages are freely available for several platforms at http://www.pdbest.dcc.ufmg.br. CONTACT: wellisson@dcc.ufmg.br, dpires@dcc.ufmg.br, raquelcm@dcc.ufmg.br. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Protein Databases, Proteins/chemistry, Software, User-Computer Interface, Computer Graphics, Humans, Protein Conformation, Reproducibility of Results
13.
Bioinformatics ; 31(6): 864-70, 2015 Mar 15.
Article in English | MEDLINE | ID: mdl-25388152

ABSTRACT

MOTIVATION: Currently, 25% of proteins annotated in Pfam have their function unknown. One way of predicting proteins function is by looking at their active site, which has two main parts: the catalytic site and the substrate binding site. The active site is more conserved than the other residues of the protein and can be a rich source of information for protein function prediction. This article presents a new heuristic method, named genetic active site search (GASS), which searches for given active site 3D templates in unknown proteins. The method can perform non-exact amino acid matches (conservative mutations), is able to find amino acids in different chains and does not impose any restrictions on the active site size. RESULTS: GASS results were compared with those catalogued in the catalytic site atlas (CSA) in four different datasets and compared with two other methods: amino acid pattern search for substructures and motif and catalytic site identification. The results show GASS can correctly identify >90% of the templates searched. Experiments were also run using data from the substrate binding sites prediction competition CASP 10, and GASS is ranked fourth among the 18 methods considered.


Subjects
Algorithms, Catalytic Domain, Protein Databases, Proteins/chemistry, Binding Sites, Computer Simulation, Humans, Tertiary Protein Structure
14.
BMC Proc ; 8(Suppl 2 Proceedings of the 3rd Annual Symposium on Biologica): S4, 2014.
Article in English | MEDLINE | ID: mdl-25237391

ABSTRACT

In this paper, we propose an interactive visualization called VERMONT which tackles the problem of visualizing mutations and infers their possible effects on the conservation of physicochemical and topological properties in protein families. More specifically, we visualize a set of structure-based sequence alignments and integrate several structural parameters that should aid biologists in gaining insight into possible consequences of mutations. VERMONT allowed us to identify patterns of position-specific properties as well as exceptions that may help predict whether specific mutations could damage protein function.

15.
Bioinformatics ; 29(7): 855-61, 2013 Apr 01.
Article in English | MEDLINE | ID: mdl-23396119

ABSTRACT

MOTIVATION: Receptor-ligand interactions are a central phenomenon in most biological systems. They are characterized by molecular recognition, a complex process mainly driven by physicochemical and structural properties of both receptor and ligand. Understanding and predicting these interactions are major steps towards protein ligand prediction, target identification, lead discovery and drug design. RESULTS: We propose a novel graph-based-binding pocket signature called aCSM, which proved to be efficient and effective in handling large-scale protein ligand prediction tasks. We compare our results with those described in the literature and demonstrate that our algorithm outperforms competing techniques. Finally, we predict novel ligands for proteins from Trypanosoma cruzi, the parasite responsible for Chagas disease, and validate them in silico via a docking protocol, showing the applicability of the method in suggesting ligands for pockets in a real-world scenario. AVAILABILITY AND IMPLEMENTATION: Datasets and the source code are available at http://www.dcc.ufmg.br/~dpires/acsm. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms, Ligands, Proteins/chemistry, Binding Sites, Enzymes/chemistry, Enzymes/metabolism, Humans, Molecular Models, Molecular Conformation, Molecular Docking Simulation, Protein Binding, Protein Conformation, Proteins/metabolism, Protozoan Proteins/chemistry, Protozoan Proteins/metabolism, Trypanosoma cruzi
16.
J Biol Chem ; 286(31): 27399-405, 2011 Aug 05.
Article in English | MEDLINE | ID: mdl-21632536

ABSTRACT

The exponential increase in genome sequencing output has led to the accumulation of thousands of predicted genes lacking a proper functional annotation. Among this mass of hypothetical proteins, enzymes catalyzing new reactions or using novel ways to catalyze already known reactions might still wait to be identified. Here, we provide a structural and biochemical characterization of the 3-keto-5-aminohexanoate cleavage enzyme (Kce), an enzymatic activity long known as being involved in the anaerobic fermentation of lysine but whose catalytic mechanism has remained elusive so far. Although the enzyme shows the ubiquitous triose phosphate isomerase (TIM) barrel fold and a Zn(2+) cation reminiscent of metal-dependent class II aldolases, our results based on a combination of x-ray snapshots and molecular modeling point to an unprecedented mechanism that proceeds through deprotonation of the 3-keto-5-aminohexanoate substrate, nucleophilic addition onto an incoming acetyl-CoA, intramolecular transfer of the CoA moiety, and final retro-Claisen reaction leading to acetoacetate and 3-aminobutyryl-CoA. This model also accounts for earlier observations showing the origin of carbon atoms in the products, as well as the absence of detection of any covalent acyl-enzyme intermediate. Kce is the first representative of a large family of prokaryotic hypothetical proteins, currently annotated as the "domain of unknown function" DUF849.


Subjects
Oxo-Acid-Lyases/metabolism, Catalysis, X-Ray Crystallography, Molecular Models, Oxo-Acid-Lyases/chemistry, Protein Conformation, Protein Folding, Substrate Specificity
17.
BMC Genomics ; 12 Suppl 4: S12, 2011 Dec 22.
Article in English | MEDLINE | ID: mdl-22369665

ABSTRACT

BACKGROUND: The unforgiving pace of growth of available biological data has increased the demand for efficient and scalable paradigms, models and methodologies for automatic annotation. In this paper, we present a novel structure-based protein function prediction and structural classification method: Cutoff Scanning Matrix (CSM). CSM generates feature vectors that represent distance patterns between protein residues. These feature vectors are then used as evidence for classification. Singular value decomposition is used as a preprocessing step to reduce dimensionality and noise. The aspect of protein function considered in the present work is enzyme activity. A series of experiments was performed on datasets based on Enzyme Commission (EC) numbers and mechanistically different enzyme superfamilies as well as other datasets derived from SCOP release 1.75. RESULTS: CSM was able to achieve a precision of up to 99% after SVD preprocessing for a database derived from manually curated protein superfamilies and up to 95% for a dataset of the 950 most-populated EC numbers. Moreover, we conducted experiments to verify our ability to assign SCOP class, superfamily, family and fold to protein domains. An experiment using the whole set of domains found in last SCOP version yielded high levels of precision and recall (up to 95%). Finally, we compared our structural classification results with those in the literature to place this work into context. Our method was capable of significantly improving the recall of a previous study while preserving a compatible precision level. CONCLUSIONS: We showed that the patterns derived from CSMs could effectively be used to predict protein function and thus help with automatic function annotation. We also demonstrated that our method is effective in structural classification tasks. These facts reinforce the idea that the pattern of inter-residue distances is an important component of family structural signatures. 
Furthermore, singular value decomposition provided a consistent increase in precision and recall, which makes it an important preprocessing step when dealing with noisy data.
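One plausible reading of a feature vector that represents distance patterns between protein residues is a cumulative count of residue pairs found within a series of increasing distance cutoffs. The sketch below follows that reading. It is an assumption-laden illustration rather than the published CSM code: the function name, the use of one 3D point per residue, and the default range and step (0 to 30 Å in 0.2 Å steps) are hypothetical choices, and the SVD preprocessing step mentioned in the abstract is not included.

```python
import math
from itertools import combinations


def cutoff_scan(coords, d_min=0.0, d_max=30.0, step=0.2):
    """Count, for each cutoff, how many residue pairs lie within that distance.

    coords: one (x, y, z) point per residue, for example C-alpha positions.
    Returns a list with one count per cutoff from d_min to d_max inclusive.
    """
    distances = [math.dist(a, b) for a, b in combinations(coords, 2)]
    n_steps = int(round((d_max - d_min) / step)) + 1
    cutoffs = [d_min + i * step for i in range(n_steps)]
    return [sum(1 for d in distances if d <= c) for c in cutoffs]
```

For three points forming a 3-4-5 right triangle, `cutoff_scan([(0, 0, 0), (3, 0, 0), (0, 4, 0)], d_max=6, step=1.0)` returns `[0, 0, 0, 1, 2, 3, 3]`. Vectors of this kind have the same length for every protein regardless of its size, which is what allows them to be stacked into a matrix and passed to a dimensionality reduction step and a classifier.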


Subjects
Enzymes/metabolism, Software, Protein Databases, Enzymes/chemistry, Enzymes/classification, Protein Folding, Tertiary Protein Structure
18.
Bioinformatics ; 26(24): 3075-82, 2010 Dec 15.
Article in English | MEDLINE | ID: mdl-20980272

ABSTRACT

MOTIVATION: Current computational approaches to function prediction are mostly based on protein sequence classification and transfer of annotation from known proteins to their closest homologous sequences relying on the orthology concept of function conservation. This approach suffers a major weakness: annotation reliability depends on global sequence similarity to known proteins and is poorly efficient for enzyme superfamilies that catalyze different reactions. Structural biology offers a different strategy to overcome the problem of annotation by adding information about protein 3D structures. This information can be used to identify amino acids located in active sites, focusing on detection of functional polymorphisms residues in an enzyme superfamily. Structural genomics programs are providing more and more novel protein structures at a high-throughput rate. However, there is still a huge gap between the number of sequences and available structures. Computational methods such as homology modeling provide reliable approaches to bridge this gap and could be a new precise tool to annotate protein functions. RESULTS: Here, we present the Active Sites Modeling and Clustering (ASMC) method, a novel unsupervised method to classify sequences using structural information of protein pockets. ASMC combines homology modeling of family members, structural alignment of modeled active sites and a subsequent hierarchical conceptual classification. Comparison of profiles obtained from computed clusters allows the identification of residues correlated to subfamily function divergence, called specificity determining positions. The ASMC method has been validated on a benchmark of 42 Pfam families for which previous resolved holo-structures were available. ASMC was also applied to several families containing known protein structures and comprehensive functional annotations.
We will discuss how ASMC improves annotation and understanding of protein families functions by giving some specific illustrative examples on nucleotidyl cyclases, protein kinases and serine proteases. AVAILABILITY: http://www.genoscope.fr/ASMC/.


Subjects
Proteins/classification, Protein Sequence Analysis/methods, Catalytic Domain, Cluster Analysis, Computational Biology/methods, Enzymes/classification, Biological Models, Molecular Sequence Annotation, Phosphorus-Oxygen Lyases/chemistry, Protein Kinases/chemistry, Proteins/chemistry, Proteins/metabolism, Sequence Alignment, Serine Proteases/chemistry